Data Analysis on FAANG Stocks from 2013 to 2020

CMSC 320 Final Project Tutorial

Authors: Harshit Raj | Vaibhav Khetan | Yash Kalyani

[Image: FAANG company logos]

Project Description -

This project lets users analyze historical and recent stock data for the FAANG companies. To carry out predictive, comparative, and quantitative analysis, we extracted the relevant information from '.csv' files containing the stock market data. We imported several packages that make the analysis more efficient. We read the '.csv' files with pandas, which splits the data on the comma delimiters and loads it into tables. Because the FAANG companies held their IPOs at different times, we plot only data from January 2013 onward for the sake of consistency. We stored the stock data in pandas DataFrames and pulled from them the information needed to build graphs that depict important stock information for each company. We also attempted hypothesis testing to check whether the data fits the line plots and scatter plots we made.

Dataset -

We used Kaggle, a popular website that gives users access to thousands of public datasets. One user, Aayush Mishra, uploaded a dataset named FAANG- Complete Stock Data that contains all FAANG stock ticks with the features Date, Open, Close, High, Low, Adj Close, and Volume. The link to this dataset is provided below:

https://www.kaggle.com/aayushmishra1512/faang-complete-stock-data

The dataset is reliable: we used Yahoo Stocks to verify that the dataset's ticks matched the data recorded by Yahoo Stocks.

Motivation -

Today, stocks are extremely important: they play an integral role in a company's growth and can also shape an individual's wealth. People have long invested their money by buying shares of companies and trading stocks, and some have made a fortune by investing in companies that grew rapidly over the last few years. However, stocks have followed erratic patterns over the years. They have suffered during recessions and have had setbacks caused by internal or external factors. Our group decided to work on a stock analysis that would help us and other readers understand more about the stock market and visualize the attributes of a particular stock at a particular time. We also wanted to examine how stock trends changed during recessions (2001, 2008) and compare them to the trends today amidst a worldwide pandemic.

Usage of Interactive Plots -

We used Plotly, a Python library for creating interactive plots. Some of its functionality is described below:

1) Hovering over any data allows a user to see what is contained in that datapoint on the plot.

2) Click on Compare Data On Hover to compare multiple datapoints on the same plot.

3) Drop Down menus can be used to toggle functionality/visualizations on the plots.

For More Information on Plotly, go to the following link:

PLOTLY FOR PYTHON

About Imports -

Here we import the packages and modules used in the project. We primarily use pandas and pyplot from matplotlib, together with Plotly for the interactive graphs. The functions in these modules keep our code compact and let us add interactive graphs that make the final product visual and easy for readers to understand.

In [55]:
!pip3 install plotly==4.14.1
Requirement already satisfied: plotly==4.14.1 in /opt/conda/lib/python3.8/site-packages (4.14.1)
Requirement already satisfied: retrying>=1.3.3 in /opt/conda/lib/python3.8/site-packages (from plotly==4.14.1) (1.3.3)
Requirement already satisfied: six in /opt/conda/lib/python3.8/site-packages (from plotly==4.14.1) (1.15.0)
In [2]:
import json
import plotly.figure_factory as ff
import plotly.graph_objects as go
import plotly.express as px
import plotly.offline as py
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=False)
In [3]:
import pandas as pd
import numpy as np
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
import sklearn
import math
from datetime import datetime, date
from sklearn import preprocessing
from sklearn import datasets
from sklearn import utils
from sklearn import linear_model
from sklearn.metrics import *
from sklearn.preprocessing import *
from statsmodels.formula.api import ols
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split

Data Collection -

In this part, we load all our collected data into our ipynb file so that it can be processed and analyzed.

In the cell below, we initialize a DataFrame for each of the FAANG companies. The stock data for each company is stored in a ".csv" file, meaning that the values we need are stored as comma-separated values. We create one pandas DataFrame per company by reading its ".csv" file.

In [4]:
facebook = pd.read_csv("data/Facebook.csv", sep=',')
apple = pd.read_csv("data/Apple.csv", sep=',')
amazon = pd.read_csv("data/Amazon.csv", sep=',')
netflix = pd.read_csv("data/Netflix.csv", sep=',')
google = pd.read_csv("data/Google.csv", sep=',')
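The five reads above follow the same pattern, so they could also be written as a single loop. The following is a minimal sketch rather than the notebook's own code: the `load_all` helper and the in-memory CSV text are illustrative assumptions (the two Facebook rows are copied from the table shown later, the "Example" rows are made up), and with real files the dictionary values would be paths such as "data/Facebook.csv".

```python
import io
import pandas as pd

HEADER = "Date,Open,High,Low,Close,Adj Close,Volume\n"

# In-memory stand-ins for the .csv files (hypothetical; real code would pass file paths).
sources = {
    "Facebook": io.StringIO(HEADER +
        "2013-01-02,27.440001,28.180000,27.420000,28.000000,28.000000,69846400\n"
        "2013-01-03,27.879999,28.469999,27.590000,27.770000,27.770000,63140600\n"),
    "Example": io.StringIO(HEADER +
        "2013-01-02,10.0,11.0,9.5,10.5,10.5,1000\n"),
}

def load_all(sources):
    """Read every source into a DataFrame, keyed by company name.

    read_csv splits on commas by default, so sep=',' is optional.
    """
    return {name: pd.read_csv(src) for name, src in sources.items()}

frames = load_all(sources)
print(frames["Facebook"].head())
```

Keeping the frames in a dictionary means that later steps (date conversion, filtering) could loop over `frames.items()` instead of repeating one line per company.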

Data Processing -

In this part, we process (clean) the data: we restrict the date range and format the data to our requirements.

When we read the .csv files above, the Date column of each DataFrame was stored as strings. Dates stored as strings cannot be used reliably for date-based filtering, date arithmetic, or time axes in graphs. Hence, we converted the entire Date column of each DataFrame from strings to datetime values, which can be used to create graphs, to compare dates, and to study stock trends on a particular day or over a range of dates.

Formatting Dates to Datetime:

In [5]:
facebook['Date'] = pd.to_datetime(facebook['Date'])
apple['Date'] = pd.to_datetime(apple['Date'])
amazon['Date'] = pd.to_datetime(amazon['Date'])
netflix['Date'] = pd.to_datetime(netflix['Date'])
google['Date'] = pd.to_datetime(google['Date'])
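To illustrate what the conversion buys us, here is a small sketch on a toy frame. The rows are made up for illustration (only the two 2013 closing prices echo the Facebook table); the point is that after `pd.to_datetime` the column exposes the `.dt` accessor and sorts chronologically.

```python
import pandas as pd

# Toy data: dates arrive as plain strings, deliberately out of order.
df = pd.DataFrame({
    "Date": ["2013-01-02", "2012-12-31", "2013-01-03"],
    "Close": [28.00, 26.62, 27.77],
})

print(df["Date"].dtype)   # a string/object dtype, not a datetime dtype

df["Date"] = pd.to_datetime(df["Date"])
print(df["Date"].dtype)   # now a datetime64 dtype

# Datetime columns support component access and chronological sorting.
years = df["Date"].dt.year.tolist()
ordered_close = df.sort_values("Date")["Close"].tolist()
print(years)
print(ordered_close)
```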

Cleaning Data:

Here we start preparing the data for the interactive graphs that will depict relevant information about the stocks. Below, you will notice that we keep only the rows in each DataFrame whose year is later than 2012. We did this because the FAANG companies had very different IPO dates. IPO stands for Initial Public Offering, the point at which a company's shares first become available for the public to buy.

To prepare each DataFrame for plotting, we first restricted it to the stock data from January 2013 to August 2020. We then reset the index, dropping the old one, so that each DataFrame is ready to plot.

Picking Our Date Ranges:

In [6]:
facebook = facebook[(facebook['Date'].dt.year > 2012) & (facebook['Date'].dt.year < 2021)]
apple = apple[(apple['Date'].dt.year > 2012) & (apple['Date'].dt.year < 2021)]
amazon = amazon[(amazon['Date'].dt.year > 2012) & (amazon['Date'].dt.year < 2021)]
netflix = netflix[(netflix['Date'].dt.year > 2012) & (netflix['Date'].dt.year < 2021)]
google = google[(google['Date'].dt.year > 2012) & (google['Date'].dt.year < 2021)]

facebook = facebook.reset_index(drop=True)
apple = apple.reset_index(drop=True)
amazon = amazon.reset_index(drop=True)
netflix = netflix.reset_index(drop=True)
google = google.reset_index(drop=True)

facebook
Out[6]:
Date Open High Low Close Adj Close Volume
0 2013-01-02 27.440001 28.180000 27.420000 28.000000 28.000000 69846400
1 2013-01-03 27.879999 28.469999 27.590000 27.770000 27.770000 63140600
2 2013-01-04 28.010000 28.930000 27.830000 28.760000 28.760000 72715400
3 2013-01-07 28.690001 29.790001 28.650000 29.420000 29.420000 83781800
4 2013-01-08 29.510000 29.600000 28.860001 29.059999 29.059999 45871300
... ... ... ... ... ... ... ...
1916 2020-08-12 258.970001 263.899994 258.109985 259.890015 259.890015 21428300
1917 2020-08-13 261.549988 265.160004 259.570007 261.299988 261.299988 17374000
1918 2020-08-14 262.309998 262.649994 258.679993 261.239990 261.239990 14792700
1919 2020-08-17 262.500000 264.100006 259.399994 261.160004 261.160004 13351100
1920 2020-08-18 260.950012 265.149994 259.260010 262.339996 262.339996 18677500

1921 rows × 7 columns
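The filter-then-reset pattern above is the same for every company, so it could be wrapped in a small helper. This is a sketch under assumptions: `clip_years` is a hypothetical name, and the toy frame is made up. `Series.between` is inclusive on both ends by default, so `between(2013, 2020)` keeps the same rows as `> 2012` combined with `< 2021`.

```python
import pandas as pd

def clip_years(df, first_year, last_year):
    """Keep rows whose Date falls within [first_year, last_year] and renumber the index from 0."""
    years = df["Date"].dt.year
    kept = df[years.between(first_year, last_year)]
    # drop=True discards the old index instead of adding it as a column
    return kept.reset_index(drop=True)

toy = pd.DataFrame({
    "Date": pd.to_datetime(["2012-12-31", "2013-01-02", "2020-08-18", "2021-01-04"]),
    "Close": [1.0, 2.0, 3.0, 4.0],
})

clipped = clip_years(toy, 2013, 2020)
print(clipped)
```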

Exploratory Analysis and Data Visualization -

Correlation Plots for Features of Individual Companies :

The following plots are visually attractive as well as interactive. Each matrix-like graph lists the stock attributes (open price, close price, high, low, adjusted close, and volume) along the left and bottom of the matrix. We first compute the day-over-day percentage change of each attribute with the .pct_change() function and store the result in a new DataFrame. We then compute the pairwise correlations of those changes with .corr() and draw the resulting matrix, filled with colors that encode the correlation values. Each cell therefore shows how strongly the changes in two attributes of the same company's stock move together. The color scale next to each plot serves as a legend that helps users read the graph.

In [7]:
corr_df_fb = facebook[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)

retscomp_fb = corr_df_fb.pct_change()

corr_fb = retscomp_fb.corr()
corr_fb
Out[7]:
Open Close High Low Adj Close Volume
Open 1.000000 0.401758 0.769315 0.758697 0.401758 0.016311
Close 0.401758 1.000000 0.747093 0.732999 1.000000 0.007707
High 0.769315 0.747093 1.000000 0.790089 0.747093 0.192635
Low 0.758697 0.732999 0.790089 1.000000 0.732999 -0.178854
Adj Close 0.401758 1.000000 0.747093 0.732999 1.000000 0.007707
Volume 0.016311 0.007707 0.192635 -0.178854 0.007707 1.000000
In [8]:
fig = px.imshow(corr_fb)

fig.update_layout(title='Correlation between Features of Facebook Stock')

iplot(fig,show_link=False)
In [9]:
corr_df_ap = apple[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)

retscomp_ap = corr_df_ap.pct_change()

corr_ap = retscomp_ap.corr()
corr_ap
Out[9]:
Open Close High Low Adj Close Volume
Open 1.000000 0.413942 0.751016 0.768682 0.414086 -0.037703
Close 0.413942 1.000000 0.742652 0.735377 0.999469 -0.106651
High 0.751016 0.742652 1.000000 0.775474 0.741752 0.113300
Low 0.768682 0.735377 0.775474 1.000000 0.734981 -0.264364
Adj Close 0.414086 0.999469 0.741752 0.734981 1.000000 -0.107836
Volume -0.037703 -0.106651 0.113300 -0.264364 -0.107836 1.000000
In [10]:
fig = px.imshow(corr_ap)

fig.update_layout(title='Correlation between Features of Apple Stock')

iplot(fig,show_link=False)
In [11]:
corr_df_am = amazon[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)

retscomp_am = corr_df_am.pct_change()

corr_am = retscomp_am.corr()
corr_am
Out[11]:
Open Close High Low Adj Close Volume
Open 1.000000 0.423578 0.786375 0.747709 0.423578 0.044092
Close 0.423578 1.000000 0.747264 0.757656 1.000000 0.058062
High 0.786375 0.747264 1.000000 0.787892 0.747264 0.236255
Low 0.747709 0.757656 0.787892 1.000000 0.757656 -0.127467
Adj Close 0.423578 1.000000 0.747264 0.757656 1.000000 0.058062
Volume 0.044092 0.058062 0.236255 -0.127467 0.058062 1.000000
In [12]:
fig = px.imshow(corr_am)

fig.update_layout(title='Correlation between Features of Amazon Stock')

iplot(fig,show_link=False)
In [13]:
corr_df_ne = netflix[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)

retscomp_ne = corr_df_ne.pct_change()

corr_ne = retscomp_ne.corr()
corr_ne
Out[13]:
Open Close High Low Adj Close Volume
Open 1.000000 0.425437 0.749779 0.784565 0.425437 0.025012
Close 0.425437 1.000000 0.763005 0.728188 1.000000 0.123859
High 0.749779 0.763005 1.000000 0.774663 0.763005 0.278367
Low 0.784565 0.728188 0.774663 1.000000 0.728188 -0.109167
Adj Close 0.425437 1.000000 0.763005 0.728188 1.000000 0.123859
Volume 0.025012 0.123859 0.278367 -0.109167 0.123859 1.000000
In [14]:
fig = px.imshow(corr_ne)

fig.update_layout(title='Correlation between Features of Netflix Stock')

iplot(fig,show_link=False)
In [15]:
corr_df_go = google[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)

retscomp_go = corr_df_go.pct_change()

corr_go = retscomp_go.corr()
corr_go
Out[15]:
Open Close High Low Adj Close Volume
Open 1.000000 0.384602 0.766185 0.726269 0.384602 0.025639
Close 0.384602 1.000000 0.724499 0.745961 1.000000 0.012011
High 0.766185 0.724499 1.000000 0.800660 0.724499 0.178618
Low 0.726269 0.745961 0.800660 1.000000 0.745961 -0.129019
Adj Close 0.384602 1.000000 0.724499 0.745961 1.000000 0.012011
Volume 0.025639 0.012011 0.178618 -0.129019 0.012011 1.000000
In [16]:
fig = px.imshow(corr_go)

fig.update_layout(title='Correlation between Features of Google Stock')

iplot(fig,show_link=False)
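The five correlation cells above differ only in the DataFrame they read, so the computation could be factored into one function. This is a minimal sketch, not the notebook's code: `return_correlation` is a hypothetical helper, and the toy data is made up so that Close is always twice Open, which makes their percentage changes identical.

```python
import pandas as pd

FEATURES = ["Open", "Close", "High", "Low", "Adj Close", "Volume"]

def return_correlation(df, columns=FEATURES):
    """Correlation matrix of day-over-day percentage changes for the given columns."""
    returns = df[columns].pct_change()  # the first row has no previous day, so it is NaN
    return returns.corr()               # pairwise Pearson correlation; NaN rows are skipped

# Toy data: Close is exactly 2 * Open, so both have the same percentage changes.
toy = pd.DataFrame({
    "Open":  [10.0, 11.0, 10.0, 12.0],
    "Close": [20.0, 22.0, 20.0, 24.0],
})

corr = return_correlation(toy, columns=["Open", "Close"])
print(corr)
```

With the real data, a call such as `return_correlation(facebook)` would be expected to reproduce the `corr_fb` table, and the result can be passed to `px.imshow` as in the cells above.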

Creating Dataframe of All Companies for All Stock Ticks:

In [17]:
facebook['Company'] = ['Facebook']*len(facebook)
apple['Company'] = ['Apple']*len(apple)
amazon['Company'] = ['Amazon']*len(amazon)
netflix['Company'] = ['Netflix']*len(netflix)
google['Company'] = ['Google']*len(google)

frames = [facebook, apple, amazon, netflix, google]

result = pd.concat(frames)
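One detail worth knowing about `pd.concat`: by default it keeps each frame's own row labels, so the combined frame contains the labels 0, 1, 2, ... once per company. The sketch below uses made-up two-row frames to show `ignore_index=True`, which renumbers the rows, and `DataFrame.assign`, which adds the company label without building a list by hand. It is an illustration of the pattern, not a replacement the original authors used.

```python
import pandas as pd

# Toy frames (the second one holds made-up values).
fb = pd.DataFrame({"Close": [28.00, 27.77]})
other = pd.DataFrame({"Close": [19.0, 19.5]})

frames = {"Facebook": fb, "Other": other}

# assign() returns a copy with a constant Company column; the originals stay untouched.
combined = pd.concat(
    [df.assign(Company=name) for name, df in frames.items()],
    ignore_index=True,  # renumber rows 0..n-1 instead of repeating each frame's labels
)
print(combined)
```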

Modifying Volume by Calculating Mean Volume per Year and Standardizing the Volume:

In [18]:
result['Date'] = pd.to_datetime(result['Date'])

# Extract the calendar year of every row in one vectorized step
result['Year'] = result['Date'].dt.year

comp = result.groupby(['Company', 'Year'])

vol_df = pd.DataFrame()
vol = []
company = []
year = []

for key, val in comp:
    a, b = key
    company.append(a)
    year.append(b)
    # Mean daily volume for this company and year
    # (selecting the column first avoids averaging non-numeric columns)
    vol.append(val['Volume'].mean())

vol_df['Company'] = company
vol_df['Year'] = year
vol_df['Volume Mean'] = vol

fig = go.Figure()

avg_vol = vol_df['Volume Mean'].mean()
stand_vol = vol_df['Volume Mean'].std()

vol_df['standard_vol'] = np.arange(len(vol_df.index))
vol_df = vol_df.reset_index(drop=True)

for x, rows in vol_df.iterrows():
    vol_df.loc[x, 'standard_vol'] = (rows['Volume Mean'] - avg_vol)/(stand_vol)
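The grouping and standardization loops above can also be expressed with vectorized pandas operations. The following sketch uses a made-up toy frame with the same column names; it computes the mean volume per company and year, then a z-score of those means, (value - mean) / standard deviation, across all groups.

```python
import pandas as pd

# Toy stand-in for the combined frame (values are made up).
toy = pd.DataFrame({
    "Company": ["A", "A", "A", "B", "B", "B"],
    "Year":    [2013, 2013, 2014, 2013, 2014, 2014],
    "Volume":  [100.0, 300.0, 500.0, 10.0, 20.0, 40.0],
})

# One row per (Company, Year) with the mean volume of that group.
vol = (toy.groupby(["Company", "Year"])["Volume"]
          .mean()
          .reset_index(name="Volume Mean"))

# z-score of the yearly means across all companies and years.
mean_all = vol["Volume Mean"].mean()
std_all = vol["Volume Mean"].std()
vol["standard_vol"] = (vol["Volume Mean"] - mean_all) / std_all
print(vol)
```

By construction, the standardized column has mean 0 and sample standard deviation 1, which is a quick sanity check for this kind of transformation.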

Standardizing Close Price:

In [19]:
avg_close = result.groupby('Date')['Close'].mean()
stand_close = result.groupby('Date')['Close'].std()

stand_close = stand_close.reset_index()
avg_close = avg_close.reset_index()

result['standard_close'] = np.arange(len(result.index))
result = result.reset_index(drop=True)

for x, rows in result.iterrows():
    result.loc[x, 'standard_close'] = (rows['Close'] - avg_close[avg_close['Date'] == rows['Date']]['Close']).values/(stand_close[stand_close['Date'] == rows['Date']]['Close']).values
    
result
Out[19]:
Date Open High Low Close Adj Close Volume Company Year standard_close
0 2013-01-02 27.440001 28.180000 27.420000 28.000000 28.000000 69846400.0 Facebook 2013 -0.663215
1 2013-01-03 27.879999 28.469999 27.590000 27.770000 27.770000 63140600.0 Facebook 2013 -0.665516
2 2013-01-04 28.010000 28.930000 27.830000 28.760000 28.760000 72715400.0 Facebook 2013 -0.659137
3 2013-01-07 28.690001 29.790001 28.650000 29.420000 29.420000 83781800.0 Facebook 2013 -0.661599
4 2013-01-08 29.510000 29.600000 28.860001 29.059999 29.059999 45871300.0 Facebook 2013 -0.661829
... ... ... ... ... ... ... ... ... ... ...
9610 2020-08-31 1643.569946 1644.500000 1625.329956 1629.530029 1629.530029 1321100.0 Google 2020 0.707107
9611 2020-09-01 1632.160034 1659.219971 1629.530029 1655.079956 1655.079956 1133800.0 Google 2020 0.707107
9612 2020-09-02 1668.010010 1726.099976 1660.189941 1717.390015 1717.390015 2476100.0 Google 2020 NaN
9613 2020-09-03 1699.520020 1700.000000 1607.709961 1629.510010 1629.510010 3180200.0 Google 2020 NaN
9614 2020-09-04 1609.000000 1634.989990 1537.970093 1581.209961 1581.209961 2792533.0 Google 2020 NaN

9615 rows × 10 columns
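The row-by-row loop above standardizes each closing price against the mean and standard deviation of all closing prices on the same date. The same computation can be sketched with `groupby(...).transform`, which returns one value per original row. The toy data below is made up, and it also shows where the values at the end of the table come from: on a date with exactly two prices the z-scores are always about ±0.707107 (1/√2), and on a date with a single price the sample standard deviation is undefined, so the result is NaN.

```python
import pandas as pd

# Toy data: two companies on the first date, only one on the second (values made up).
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-09-01", "2020-09-01", "2020-09-02"]),
    "Company": ["A", "B", "B"],
    "Close": [500.0, 1655.0, 1717.0],
})

by_date = toy.groupby("Date")["Close"]

# transform() broadcasts each date's statistic back onto that date's rows.
toy["standard_close"] = (toy["Close"] - by_date.transform("mean")) / by_date.transform("std")
print(toy)
```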

Open, Close, Volume and Moving Averages:

In this section we make five graphs, one for each FAANG company. Each graph shows the opening price, the closing price, the volume, and the 14-, 21-, and 100-day moving averages. These attributes matter to investors, who often use them to decide whether to buy or sell a stock. We calculated the moving averages because many buyers and sellers base their decisions on them.

We also show the volume in the background of each plot as bars of mean yearly volume, scaled down so that the bars fit alongside the prices. The dropdown menu offers a further option that shows the unscaled volume bars on their own.

As you can see, the graphs in this section are all interactive. Each attribute has its own view, and a user can select which one to study from the dropdown menu.
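The plots below divide the volume by a hand-picked constant for each company so that the bars fit on the price axis. As an alternative, the divisor could be derived from the data. The sketch below is an assumption-laden illustration: `volume_scale` is a hypothetical helper and the numbers are made up. It picks the divisor that makes the tallest volume bar as high as the highest price.

```python
import pandas as pd

def volume_scale(prices, volumes):
    """Divisor that shrinks the volumes so the tallest bar matches the highest price."""
    return volumes.max() / prices.max()

# Made-up prices and yearly mean volumes.
prices = pd.Series([28.0, 120.0, 262.0])
volumes = pd.Series([6.0e7, 2.5e7, 1.31e7])

factor = volume_scale(prices, volumes)
scaled = volumes / factor
print(factor)
print(scaled.tolist())
```

Because the scaled bars no longer carry real units, it helps to label the trace as scaled, as the plots below do with the name 'Volume (scaled)'.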

Open, Close, Volume and Moving Averages for Facebook:

The graph below depicts the opening price, closing price, and 14-, 21-, and 100-day moving averages for Facebook. In the plot, the mean yearly volume tends to fall as the price rises. Note that volume counts the shares traded in a period, not the number of shares in existence, so this is an observed pattern rather than a rule that high volume causes a low price. Investors and users can study the plot to inform their decisions or to learn about the past and present of Facebook stock.

In [20]:
avg_14 = facebook.Close.rolling(window=14, min_periods=1).mean()
avg_21 = facebook.Close.rolling(window=21, min_periods=1).mean()
avg_100 = facebook.Close.rolling(window=100, min_periods=1).mean()
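The `min_periods=1` argument matters at the start of each series. The sketch below uses a made-up four-value series and a 3-day window: without `min_periods`, the average is undefined until a full window is available, while `min_periods=1` averages whatever values exist so far. This is why the moving-average lines in the plots start on the first day instead of after 14, 21, or 100 days.

```python
import pandas as pd

close = pd.Series([10.0, 20.0, 30.0, 40.0])

# Default: NaN until 3 values are available.
strict = close.rolling(window=3).mean()

# min_periods=1: average of the values seen so far, up to 3 of them.
lenient = close.rolling(window=3, min_periods=1).mean()

print(strict.tolist())   # [nan, nan, 20.0, 30.0]
print(lenient.tolist())  # [10.0, 15.0, 20.0, 30.0]
```

One consequence is that the earliest points of a 100-day average are based on far fewer than 100 days, so they track the price more closely than the rest of the line.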
In [21]:
x_fb = facebook['Date']
y_fb = facebook['Open']
z_fb = facebook['Close']

fig = go.Figure()

fig.add_trace(go.Scatter(x=x_fb, y=y_fb, name='Open',
                         line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_fb, y=z_fb, name = 'Close',
                         line=dict(color='firebrick', width=1.5)))
fig.add_trace(go.Scatter(x=x_fb, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='goldenrod', width=1.5)))
fig.add_trace(go.Scatter(x=x_fb, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_fb, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Facebook']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Facebook']['Volume Mean']/200000, name='Volume (scaled)', 
                     marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Facebook']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Facebook']['Volume Mean'], name='Volume', 
                     marker_color='slategray', visible='legendonly'))


fig.update_layout(title='Open/Close prices and Volume for Facebook from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Open/Close/Volume', template="plotly_dark")

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True, True, False]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Open Price',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False, False, False]},
                          {'title': 'Open Price',
                           'showlegend':True}]),
             dict(label = 'Close Price',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False, False, False]},
                          {'title': 'Close Price',
                           'showlegend':True}]),
             dict(label = '14 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False, False, False]},
                          {'title': '14 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '21 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False, False, False]},
                          {'title': '21 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '100 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True, False, False]},
                          {'title': '100 Day Moving Average',
                           'showlegend':True}]),
            dict(label = 'Volume (not scaled)',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, False, True]},
                          {'title': 'Volume (not scaled)',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
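The hand-written visibility lists above are easy to get wrong, since each list must have one entry per trace in the order the traces were added. As a hedged alternative, the buttons could be generated from the trace names. `make_buttons` below is a hypothetical helper written in plain Python (it only builds dictionaries and does not call Plotly). Unlike the menu above, it also produces a button for the scaled volume trace.

```python
def make_buttons(labels, all_label="All", hidden_in_all=("Volume (not scaled)",)):
    """Build Plotly 'update' buttons: one 'All' button plus one button per trace.

    labels: trace names in the order the traces were added to the figure.
    hidden_in_all: trace names left out of the 'All' view.
    """
    def button(label, mask):
        return dict(label=label,
                    method="update",
                    args=[{"visible": mask},
                          {"title": label, "showlegend": True}])

    # 'All' shows every trace except the ones listed in hidden_in_all.
    buttons = [button(all_label, [name not in hidden_in_all for name in labels])]
    # Each remaining button shows exactly one trace.
    for i, label in enumerate(labels):
        buttons.append(button(label, [j == i for j in range(len(labels))]))
    return buttons

labels = ["Open Price", "Close Price", "14 Day Moving Average",
          "21 Day Moving Average", "100 Day Moving Average",
          "Volume (scaled)", "Volume (not scaled)"]

buttons = make_buttons(labels)
print(buttons[0]["args"][0]["visible"])
```

The resulting list could be passed as the `buttons` entry of the `updatemenus` dictionary in place of the literal list. Because each title is taken from the button's own label, a title cannot drift out of sync with its button.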

Open, Close, Volume and Moving Averages for Apple:

The graph below depicts the opening price, closing price, and 14-, 21-, and 100-day moving averages for Apple. In the plot, the mean yearly volume tends to fall as the price rises. Note that volume counts the shares traded in a period, not the number of shares in existence, so this is an observed pattern rather than a rule that high volume causes a low price. Investors and users can study the plot to inform their decisions or to learn about the past and present of Apple stock.

In [22]:
avg_14 = apple.Close.rolling(window=14, min_periods=1).mean()
avg_21 = apple.Close.rolling(window=21, min_periods=1).mean()
avg_100 = apple.Close.rolling(window=100, min_periods=1).mean()
In [23]:
x_ap = apple['Date']
y_ap = apple['Open']
z_ap = apple['Close']

fig = go.Figure()

fig.add_trace(go.Scatter(x=x_ap, y=y_ap, name='Open',
                         line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_ap, y=z_ap, name = 'Close',
                         line=dict(color='firebrick', width=1.5)))
fig.add_trace(go.Scatter(x=x_ap, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='goldenrod', width=1.5)))
fig.add_trace(go.Scatter(x=x_ap, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_ap, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Apple']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Apple']['Volume Mean']/3500000, name='Volume (scaled)', 
                     marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Apple']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Apple']['Volume Mean'], name='Volume', 
                     marker_color='slategray', visible='legendonly'))


fig.update_layout(title='Open/Close prices and Volume for Apple from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Open/Close/Volume', template="plotly_dark")

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True, True, False]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Open Price',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False, False, False]},
                          {'title': 'Open Price',
                           'showlegend':True}]),
             dict(label = 'Close Price',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False, False, False]},
                          {'title': 'Close Price',
                           'showlegend':True}]),
             dict(label = '14 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False, False, False]},
                          {'title': '14 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '21 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False, False, False]},
                          {'title': '21 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '100 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True, False, False]},
                          {'title': '100 Day Moving Average',
                           'showlegend':True}]),
            dict(label = 'Volume (not scaled)',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, False, True]},
                          {'title': 'Volume (not scaled)',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Open, Close, Volume and Moving Averages for Amazon:

The graph below depicts the opening price, closing price, and 14-, 21-, and 100-day moving averages for Amazon. In the plot, the mean yearly volume tends to fall as the price rises. Note that volume counts the shares traded in a period, not the number of shares in existence, so this is an observed pattern rather than a rule that high volume causes a low price. Investors and users can study the plot to inform their decisions or to learn about the past and present of Amazon stock.

In [24]:
avg_14 = amazon.Close.rolling(window=14, min_periods=1).mean()
avg_21 = amazon.Close.rolling(window=21, min_periods=1).mean()
avg_100 = amazon.Close.rolling(window=100, min_periods=1).mean()
In [25]:
x_am = amazon['Date']
y_am = amazon['Open']
z_am = amazon['Close']

fig = go.Figure()

fig.add_trace(go.Scatter(x=x_am, y=y_am, name='Open',
                         line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_am, y=z_am, name = 'Close',
                         line=dict(color='firebrick', width=1.5)))
fig.add_trace(go.Scatter(x=x_am, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='goldenrod', width=1.5)))
fig.add_trace(go.Scatter(x=x_am, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_am, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Amazon']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Amazon']['Volume Mean']/2000, name='Volume (scaled)', 
                     marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Amazon']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Amazon']['Volume Mean'], name='Volume', 
                     marker_color='slategray', visible='legendonly'))


fig.update_layout(title='Open/Close prices and Volume for Amazon from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Open/Close/Volume', template="plotly_dark")

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True, True, False]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Open Price',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False, False, False]},
                          {'title': 'Open Price',
                           'showlegend':True}]),
             dict(label = 'Close Price',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False, False, False]},
                          {'title': 'Close Price',
                           'showlegend':True}]),
             dict(label = '14 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False, False, False]},
                          {'title': '14 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '21 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False, False, False]},
                          {'title': '21 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '100 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True, False, False]},
                          {'title': '100 Day Moving Average',
                           'showlegend':True}]),
            dict(label = 'Volume (not scaled)',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, False, True]},
                          {'title': 'Volume (not scaled)',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Open, Close, Volume and Moving Averages for Netflix:

The graph below depicts the opening price, closing price, and 14-, 21-, and 100-day moving averages for Netflix. In the plot, the mean yearly volume tends to fall as the price rises. Note that volume counts the shares traded in a period, not the number of shares in existence, so this is an observed pattern rather than a rule that high volume causes a low price. Investors and users can study the plot to inform their decisions or to learn about the past and present of Netflix stock.

In [26]:
avg_14 = netflix.Close.rolling(window=14, min_periods=1).mean()
avg_21 = netflix.Close.rolling(window=21, min_periods=1).mean()
avg_100 = netflix.Close.rolling(window=100, min_periods=1).mean()
In [27]:
x_ne = netflix['Date']
y_ne = netflix['Open']
z_ne = netflix['Close']

fig = go.Figure()

fig.add_trace(go.Scatter(x=x_ne, y=y_ne, name='Open',
                         line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_ne, y=z_ne, name = 'Close',
                         line=dict(color='firebrick', width=1.5)))
# Plot the Netflix moving averages against the Netflix dates (x_ne), not Facebook's (x_fb)
fig.add_trace(go.Scatter(x=x_ne, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='goldenrod', width=1.5)))
fig.add_trace(go.Scatter(x=x_ne, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_ne, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Netflix']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Netflix']['Volume Mean']/50000, name='Volume (scaled)', 
                     marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Netflix']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Netflix']['Volume Mean'], name='Volume', 
                     marker_color='slategray', visible='legendonly'))


fig.update_layout(title='Open/Close prices and Volume for Netflix from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Open/Close/Volume', template="plotly_dark")

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True, True, False]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Open Price',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False, False, False]},
                          {'title': 'Open Price',
                           'showlegend':True}]),
             dict(label = 'Close Price',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False, False, False]},
                          {'title': 'Close Price',
                           'showlegend':True}]),
             dict(label = '14 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False, False, False]},
                          {'title': '14 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '21 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False, False, False]},
                          {'title': '21 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '100 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True, False, False]},
                          {'title': '100 Day Moving Average',
                           'showlegend':True}]),
            dict(label = 'Volume (not scaled)',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, False, True]},
                          {'title': 'Volume (not scaled)',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Open, Close, Volume and Moving Averages for Google:

The graph below depicts the opening price, closing price, and the 14-, 21- and 100-day moving averages of the closing price for Google, along with the mean yearly trading volume. Generally, stocks with higher volumes have a lower price since there's more of the same stock. This relation can be seen in the plot: as volume decreases, price increases. Investors and other users can study this to inform their decisions, or simply to learn about the past and present of Google stock.

In [28]:
avg_14 = google.Close.rolling(window=14, min_periods=1).mean()
avg_21 = google.Close.rolling(window=21, min_periods=1).mean()
avg_100 = google.Close.rolling(window=100, min_periods=1).mean()
In [29]:
x_go = google['Date']
y_go = google['Open']
z_go = google['Close']

fig = go.Figure()

fig.add_trace(go.Scatter(x=x_go, y=y_go, name='Open',
                         line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_go, y=z_go, name = 'Close',
                         line=dict(color='firebrick', width=1.5)))
# Plot the Google moving averages against the Google dates (x_go), not Facebook's (x_fb)
fig.add_trace(go.Scatter(x=x_go, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='goldenrod', width=1.5)))
fig.add_trace(go.Scatter(x=x_go, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_go, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Google']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Google']['Volume Mean']/2000, name='Volume (scaled)', 
                     marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Google']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Google']['Volume Mean'], name='Volume', 
                     marker_color='slategray', visible='legendonly'))


fig.update_layout(title='Open/Close prices and Volume for Google from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Open/Close/Volume', template="plotly_dark")

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True, True, False]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Open Price',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False, False, False]},
                          {'title': 'Open Price',
                           'showlegend':True}]),
             dict(label = 'Close Price',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False, False, False]},
                          {'title': 'Close Price',
                           'showlegend':True}]),
             dict(label = '14 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False, False, False]},
                          {'title': '14 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '21 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False, False, False]},
                          {'title': '21 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '100 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True, False, False]},
                          {'title': '100 Day Moving Average',
                           'showlegend':True}]),
            dict(label = 'Volume (not scaled)',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, False, True]},
                          {'title': 'Volume (not scaled)',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Calculate the Correlation Between Companies and their Stocks:

In [30]:
df_corr = pd.DataFrame()

df_corr['Facebook'] = facebook['Close']
df_corr['Apple'] = apple['Close']
df_corr['Amazon'] = amazon['Close']
df_corr['Netflix'] = netflix['Close']
df_corr['Google'] = google['Close']

retscomp = df_corr.pct_change()

corr = retscomp.corr()
corr
Out[30]:
          Facebook     Apple    Amazon   Netflix    Google
Facebook  1.000000  0.444546  0.505884  0.345712  0.562611
Apple     0.444546  1.000000  0.431872  0.250707  0.522914
Amazon    0.505884  0.431872  1.000000  0.439284  0.601770
Netflix   0.345712  0.250707  0.439284  1.000000  0.413904
Google    0.562611  0.522914  0.601770  0.413904  1.000000

Closing Price Correlation Plot:

The following plot is interactive. This matrix-like graph lists the companies along the left and bottom borders. We computed the day-over-day percentage change of each company's closing price with the pandas .pct_change() function and stored the results in a new dataframe. The correlation matrix of these percentage changes is then drawn as a heatmap, in which the color of each cell encodes the correlation between two companies' stocks. This correlation helps users understand how far a rise or fall in one company's stock price tends to coincide with a rise or fall in the price of another company in the same sector. The color scale beside the plot serves as a legend for reading the values.
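To make the two steps concrete, here is a minimal sketch on invented prices. The tickers `A`, `B` and `C` and their values are hypothetical and were chosen so that the expected correlations are easy to reason about.

```python
import pandas as pd

# Hypothetical closing prices for three made-up tickers
prices = pd.DataFrame({
    'A': [100.0, 110.0, 99.0, 108.9],
    'B': [50.0, 55.0, 49.5, 54.45],   # moves in lockstep with A
    'C': [20.0, 18.0, 19.8, 17.82],   # moves opposite to A
})

# Day-over-day fractional change; the first row has no previous day and becomes NaN
returns = prices.pct_change()

# Pearson correlation of the returns (rows containing NaN are skipped)
corr_demo = returns.corr()
print(corr_demo.round(3))
```

Correlating percentage changes rather than raw prices is a deliberate choice: two prices that both trend upward over several years would look strongly correlated even if their daily movements had little in common.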

ASSUMPTION:

We noticed that the Open, Close, High and Low prices were fairly similar. Also, when it comes to actually buying and selling stock, traders normally pick based on the close price. This is why, from this point forward, we analyze stocks based on their Close Price.

In [31]:
fig = px.imshow(corr)

fig.update_layout(title='Correlation between All FAANG Stocks Close Price')

iplot(fig,show_link=False)

Graph for Closing prices for FAANG Stocks from 2013 to 2020:

The graph below shows the closing prices for the stocks of the FAANG companies. One of its most useful features is that it is interactive: the user can zoom in on a particular timeframe within the given range of dates and hover over a line to read the closing price of a stock on an exact date. Furthermore, if five line plots in one graph are hard to read, the user can use the dropdown menu to show the closing price of only one stock. We took the closing price of each stock for each trading day from 2013 to 2020 from the dataframes we created, and then used functions provided by plotly to build this interactive graph.
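The dropdown works by giving each button a list of booleans, one per trace, that says which traces stay visible. Writing these lists by hand is repetitive, so the sketch below shows one way they could be generated. The helper `make_buttons` is hypothetical (it is not part of plotly or of this project), and for simplicity it uses the same string for a button's label and its title.

```python
def make_buttons(names):
    """Build an 'All' button plus one button per trace name."""
    buttons = [dict(label='All', method='update',
                    args=[{'visible': [True] * len(names)},
                          {'title': 'All', 'showlegend': True}])]
    for i, name in enumerate(names):
        # Only the i-th trace is visible for this button
        mask = [j == i for j in range(len(names))]
        buttons.append(dict(label=name, method='update',
                            args=[{'visible': mask},
                                  {'title': name, 'showlegend': True}]))
    return buttons


buttons = make_buttons(['FB', 'AAPL', 'AMZN', 'NFLX', 'GOOG'])
print(buttons[2]['args'][0]['visible'])
```

The resulting list could be passed as the `buttons` entry of the `updatemenus` dictionary in place of the hand-written one. The order of the names must match the order in which the traces were added to the figure.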

In [32]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=facebook.Date, y=facebook.Close, name='FB'))
fig.add_trace(go.Scatter(x=apple.Date, y=apple.Close, name='AAPL'))
fig.add_trace(go.Scatter(x=amazon.Date, y=amazon.Close, name='AMZN'))
fig.add_trace(go.Scatter(x=netflix.Date, y=netflix.Close, name='NFLX'))
fig.add_trace(go.Scatter(x=google.Date, y=google.Close, name='GOOG'))

fig.update_layout(title='Close prices for All Companies from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Facebook',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False]},
                          {'title': 'FB',
                           'showlegend':True}]),
             dict(label = 'Apple',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False]},
                          {'title': 'AAPL',
                           'showlegend':True}]),
             dict(label = 'Amazon',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False]},
                          {'title': 'AMZN',
                           'showlegend':True}]),
             dict(label = 'Netflix',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False]},
                          {'title': 'NFLX',
                           'showlegend':True}]),
             dict(label = 'Google',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True]},
                          {'title': 'GOOG',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Standardizing Closing Prices for Stocks:

The prices of individual investment securities can vary widely and thus a common reporting practice is to standardize or index these values to a baseline value. Hence, in this section we have standardized the closing prices for the stocks of each FAANG company. Furthermore, we have made an interactive graph that determines the relationship between these stocks by taking into consideration the standardized closing values. The user can hover over the graph and get the values of closing prices for each stock by using the “Compare Data on Hover” function of the graph.
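One common way to standardize a price series is the z-score: subtract the mean and divide by the standard deviation. The sketch below assumes a long-format table with `Company` and `Close` columns, shaped like `result`, but the numbers are invented. Computing the mean and standard deviation separately for each company is an assumption on our part, not necessarily how the `standard_close` column was built.

```python
import pandas as pd

# Hypothetical long-format table, shaped like `result` but with made-up numbers
demo = pd.DataFrame({
    'Company': ['X', 'X', 'X', 'Y', 'Y', 'Y'],
    'Close':   [10.0, 20.0, 30.0, 1000.0, 1500.0, 2000.0],
})

# z-score within each company: subtract the company's mean, divide by its standard deviation
grouped = demo.groupby('Company')['Close']
demo['standard_close'] = (demo['Close'] - grouped.transform('mean')) / grouped.transform('std')

print(demo)
```

After this step both made-up companies span the same range of standardized values even though their raw prices differ by two orders of magnitude, which is what makes the lines comparable on one axis.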

In [33]:
fig = px.line(result, x="Date", y="standard_close", color='Company')

fig.update_layout(title='Standardized Close prices for All Companies from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Standardized Close Price', template="plotly_dark")

iplot(fig,show_link=False)

Standardized Volume Graphs Grouped by Year:

Volume measures the number of shares traded in a stock or contracts traded in futures or options. Volume can be an indicator of market strength, as rising markets on increasing volume are typically viewed as strong and healthy. In this part we have plotted a bar chart that compares the standardized mean yearly trading volume of each company's stock from 2013 to 2020.

In [34]:
# Extract the calendar year from each date
result['Date'] = pd.to_datetime(result['Date'])
result['Year'] = result['Date'].dt.year

# Mean trading volume for each (Company, Year) pair
vol_df = (result.groupby(['Company', 'Year'])['Volume']
                .mean()
                .reset_index(name='Volume Mean'))

fig = go.Figure()

# Standardize the yearly means (z-score across all companies and years)
avg_vol = vol_df['Volume Mean'].mean()
stand_vol = vol_df['Volume Mean'].std()
vol_df['standard_vol'] = (vol_df['Volume Mean'] - avg_vol) / stand_vol

    
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Facebook']['Year'], y=vol_df[vol_df['Company'] == 'Facebook']['standard_vol'], name='FB'))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Apple']['Year'], y=vol_df[vol_df['Company'] == 'Apple']['standard_vol'], name='AAPL'))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Amazon']['Year'], y=vol_df[vol_df['Company'] == 'Amazon']['standard_vol'], name='AMZN'))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Netflix']['Year'], y=vol_df[vol_df['Company'] == 'Netflix']['standard_vol'], name='NFLX'))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Google']['Year'], y=vol_df[vol_df['Company'] == 'Google']['standard_vol'], name='GOOG'))

fig.update_layout(title='Standardized Volume for All Companies from Jan 2013 to Aug 2020 Grouped by Year',
                   xaxis_title='Year',
                   yaxis_title='Standard Volume', template="plotly_dark")

iplot(fig,show_link=False)

Analysis, Hypothesis Testing and Machine Learning -

Fitted Regression Graphs:

This set of graphs represents the fitted regression models for the stock prices of the FAANG companies. This is where machine learning comes in: we split the data into two sets, training and testing. The training values are used to fit the model, so that it learns the patterns in the data. The testing values are held out and used to check how well the fitted model predicts data it has not seen. We have plotted regression lines to fit this data and find the best-fitting method out of Linear, k-NN and Decision Tree Regression.
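The plots in this section compare the three models visually. A numeric comparison is also possible, for example with the R-squared score on the held-out test set. The sketch below is an illustration on synthetic data (an upward trend plus noise standing in for timestamp and Close), not a result for any FAANG stock. The names `X_demo`, `y_demo` and `scores` are introduced here only for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for (timestamp, Close): an upward trend plus noise
rng = np.random.default_rng(0)
X_demo = np.arange(200, dtype=float).reshape(-1, 1)
y_demo = 0.5 * X_demo.ravel() + rng.normal(scale=5.0, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

models = {
    'Linear': LinearRegression(),
    'k-NN': KNeighborsRegressor(),
    'Decision Tree': DecisionTreeRegressor(random_state=0),
}

# R^2 on the held-out test set; values closer to 1 indicate a better fit
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
for name, r2 in scores.items():
    print(f'{name}: R^2 = {r2:.3f}')
```

To apply the same idea to a real stock, the synthetic arrays would be replaced by the `X` and `y` arrays built from the company's dataframe.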

Fitted Regression Graph for Facebook:

This graph overlays the three regression techniques on the training and testing data for Facebook's closing price, so that we can see which model fits the data best. It also shows the linear line of best fit, which makes the overall trend easier to read.

In [35]:
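# Convert dates to Unix timestamps in seconds; 'int64' avoids overflow where int defaults to 32 bits
facebook['timestamp'] = pd.to_datetime(facebook.Date).astype('int64') // (10**9)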
X = np.array(facebook['timestamp']).reshape(-1,1)
y = np.array(facebook['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig = go.Figure()

fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))

model = KNeighborsRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree', marker_color='gold'))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Linear Regression',
                  method = 'update',
                  args = [{'visible': [True, True, True, False, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, True, False]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, False, True]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Regression Line Fit for Facebook from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [36]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.856
Model:                            OLS   Adj. R-squared (uncentered):              0.856
Method:                 Least Squares   F-statistic:                          1.145e+04
Date:                Sun, 20 Dec 2020   Prob (F-statistic):                        0.00
Time:                        21:43:12   Log-Likelihood:                         -10322.
No. Observations:                1921   AIC:                                  2.065e+04
Df Residuals:                    1920   BIC:                                  2.065e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1          8.612e-08   8.05e-10    107.007      0.000    8.45e-08    8.77e-08
==============================================================================
Omnibus:                      372.431   Durbin-Watson:                   0.003
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               76.141
Skew:                           0.017   Prob(JB):                     2.93e-17
Kurtosis:                       2.025   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Hypothesis Testing:

Hypothesis Testing to Check for a Relationship Between Time and Close Price for Facebook:

A hypothesis test is conducted to see if there is a relationship between time and Close Price for Facebook at the 95% confidence level.

Hypothesis -

Null Hypothesis: There is no relationship between time and Close Price for Facebook.

Alternative Hypothesis: There is a relationship between time and Close Price for Facebook.

Decision Rule -

The alpha value here is 0.05.

  • If the p-value is greater than the alpha value, we fail to reject the null hypothesis.
  • If the p-value is smaller than the alpha value, we reject the null hypothesis and accept the alternative hypothesis

Test Statistic -

p-value = 0.000 (the P>|t| entry for x1 in the OLS summary above, rounded to three decimal places)

Decision -

The p-value we get is 0.000, which is smaller than the alpha value of 0.05, so we reject the null hypothesis and accept the alternative hypothesis.

Conclusion -

As the null hypothesis is rejected, we can conclude that there is a relationship between time and Close Price of Facebook.

Fitted Regression Graph for Apple:

This graph overlays the three regression techniques on the training and testing data for Apple's closing price, so that we can see which model fits the data best. It also shows the linear line of best fit, which makes the overall trend easier to read.

In [37]:
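# Convert dates to Unix timestamps in seconds; 'int64' avoids overflow where int defaults to 32 bits
apple['timestamp'] = pd.to_datetime(apple.Date).astype('int64') // (10**9)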
X = np.array(apple['timestamp']).reshape(-1,1)
y = np.array(apple['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig = go.Figure()

fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))

model = KNeighborsRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree', marker_color='gold'))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Linear Regression',
                  method = 'update',
                  args = [{'visible': [True, True, True, False, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, True, False]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, False, True]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Regression Line Fit for Apple from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [38]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.823
Model:                            OLS   Adj. R-squared (uncentered):              0.823
Method:                 Least Squares   F-statistic:                              8980.
Date:                Sun, 20 Dec 2020   Prob (F-statistic):                        0.00
Time:                        21:43:12   Log-Likelihood:                         -8302.5
No. Observations:                1931   AIC:                                  1.661e+04
Df Residuals:                    1930   BIC:                                  1.661e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1          2.599e-08   2.74e-10     94.764      0.000    2.54e-08    2.65e-08
==============================================================================
Omnibus:                      675.510   Durbin-Watson:                   0.002
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             2247.580
Skew:                           1.755   Prob(JB):                         0.00
Kurtosis:                       6.951   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Hypothesis Testing:

Hypothesis Testing to Check for a Relationship Between Time and Close Price for Apple:

A hypothesis test is conducted to see if there is a relationship between time and Close Price for Apple at the 95% confidence level.

Hypothesis -

Null Hypothesis: There is no relationship between time and Close Price for Apple.

Alternative Hypothesis: There is a relationship between time and Close Price for Apple.

Decision Rule -

The alpha value here is 0.05.

  • If the p-value is greater than the alpha value, we fail to reject the null hypothesis.
  • If the p-value is smaller than the alpha value, we reject the null hypothesis and accept the alternative hypothesis

Test Statistic -

p-value = 0.000 (the P>|t| entry for x1 in the OLS summary above, rounded to three decimal places)

Decision -

The p-value we get is 0.000, which is smaller than the alpha value of 0.05, so we reject the null hypothesis and accept the alternative hypothesis.

Conclusion -

As the null hypothesis is rejected, we can conclude that there is a relationship between time and Close Price of Apple.

Fitted Regression Graph for Amazon:

This graph overlays the three regression techniques on the training and testing data for Amazon's closing price, so that we can see which model fits the data best. It also shows the linear line of best fit, which makes the overall trend easier to read.

In [39]:
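# Convert dates to Unix timestamps in seconds; 'int64' avoids overflow where int defaults to 32 bits
amazon['timestamp'] = pd.to_datetime(amazon.Date).astype('int64') // (10**9)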
X = np.array(amazon['timestamp']).reshape(-1,1)
y = np.array(amazon['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig = go.Figure()

fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))

model = KNeighborsRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree', marker_color='gold'))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Linear Regression',
                  method = 'update',
                  args = [{'visible': [True, True, True, False, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, True, False]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, False, True]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Regression Line Fit for Amazon from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [40]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.716
Model:                            OLS   Adj. R-squared (uncentered):              0.716
Method:                 Least Squares   F-statistic:                              4845.
Date:                Sun, 20 Dec 2020   Prob (F-statistic):                        0.00
Time:                        21:43:13   Log-Likelihood:                         -15159.
No. Observations:                1919   AIC:                                  3.032e+04
Df Residuals:                    1918   BIC:                                  3.033e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1           7.01e-07   1.01e-08     69.603      0.000    6.81e-07    7.21e-07
==============================================================================
Omnibus:                      180.785   Durbin-Watson:                   0.001
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              234.645
Skew:                           0.855   Prob(JB):                     1.12e-51
Kurtosis:                       2.908   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Hypothesis Testing:

Hypothesis Testing to Check for a Relationship Between Time and Close Price for Amazon:

A hypothesis test is conducted to see whether there is a relationship between time and Close Price for Amazon at a 95% confidence level.

Hypothesis -

Null Hypothesis: There is no relationship between time and Close Price for Amazon

Alternative Hypothesis: There is a relationship between time and Close Price for Amazon

Decision Rule -

The alpha value here is 0.05.

  • If the p-value is greater than the alpha value, we fail to reject the null hypothesis.
  • If the p-value is smaller than the alpha value, we reject the null hypothesis and accept the alternative hypothesis
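
The decision rule above can be sketched as a small function. The name decide is illustrative, and treating a p-value exactly equal to alpha as "fail to reject" is an assumption, since the rule above does not cover that case.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when p_value is below alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    # Assumption: a p-value equal to alpha is treated as "fail to reject".
    return "fail to reject the null hypothesis"


print(decide(0.000))  # p-value as printed in the OLS summary
print(decide(0.20))   # an example of a p-value above alpha
```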

Test Statistic -

t = 69.603, p-value = 0.000 (as reported for x1 in the OLS summary above, i.e. below 0.0005)

Decision -

The p-value is smaller than the alpha value of 0.05, so we reject the null hypothesis in favor of the alternative hypothesis.

Conclusion -

As the null hypothesis is rejected, we can conclude that there is a relationship between time and Close Price of Amazon.

Fitted Regression Graph for Netflix:

This graph overlays three regression techniques (linear regression, k-NN, and decision tree) on the training and testing data for Netflix's closing price, so that the fits can be compared. The linear regression trace is the line of best fit, which makes the overall trend easier to read.
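
The fitting steps in the cell below are the same for every regressor, so they could be collected in one helper. This is only a sketch: fit_curves is a hypothetical function, the toy arrays stand in for the timestamp and Close columns, and random_state is set on the tree just to make the sketch repeatable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor


def fit_curves(X_train, y_train, x_min, x_max, n_points=100):
    """Fit each regressor and return {name: (x_range, y_range)} for plotting."""
    models = {
        'Linear Regression': LinearRegression(),
        'kNN Regressor': KNeighborsRegressor(),
        'Decision Tree': DecisionTreeRegressor(random_state=0),
    }
    x_range = np.linspace(x_min, x_max, n_points)
    curves = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        curves[name] = (x_range, model.predict(x_range.reshape(-1, 1)))
    return curves


# Toy data standing in for (timestamp, Close).
X_demo = np.arange(50, dtype=float).reshape(-1, 1)
y_demo = 2.0 * X_demo.ravel() + 1.0
curves = fit_curves(X_demo, y_demo, X_demo.min(), X_demo.max())
```

Each entry of curves can then be passed to go.Scatter as x and y, one trace per model.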

In [41]:
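# Unix time in seconds; int64 is spelled out because plain int can map to 32 bits on some platforms.
netflix['timestamp'] = pd.to_datetime(netflix.Date).astype('int64') // (10**9)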
X = np.array(netflix['timestamp']).reshape(-1,1)
y = np.array(netflix['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig = go.Figure()

fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))

model = KNeighborsRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree', marker_color='gold'))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Linear Regression',
                  method = 'update',
                  args = [{'visible': [True, True, True, False, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, True, False]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, False, True]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Regression Line Fit for Netflix from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [42]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.688
Model:                            OLS   Adj. R-squared (uncentered):              0.688
Method:                 Least Squares   F-statistic:                              4218.
Date:                Sun, 20 Dec 2020   Prob (F-statistic):                        0.00
Time:                        21:43:13   Log-Likelihood:                         -11892.
No. Observations:                1910   AIC:                                  2.379e+04
Df Residuals:                    1909   BIC:                                  2.379e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1          1.231e-07   1.89e-09     64.949      0.000    1.19e-07    1.27e-07
==============================================================================
Omnibus:                      396.692   Durbin-Watson:                   0.002
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              215.406
Skew:                           0.684   Prob(JB):                     1.68e-47
Kurtosis:                       2.087   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
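
A Durbin-Watson value near 2 indicates uncorrelated residuals, while the 0.002 reported above indicates strong positive autocorrelation. The nonrobust standard errors and p-values in this summary are therefore likely optimistic. Below is a sketch of how the statistic is computed, using made-up residuals rather than the notebook's. The function name durbin_watson_stat is illustrative.

```python
import numpy as np


def durbin_watson_stat(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    residuals = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)


rng = np.random.default_rng(0)
white_noise = rng.normal(size=2000)    # uncorrelated residuals
random_walk = np.cumsum(white_noise)   # strongly autocorrelated residuals

dw_noise = durbin_watson_stat(white_noise)  # close to 2
dw_walk = durbin_watson_stat(random_walk)   # close to 0
print(dw_noise, dw_walk)
```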

Hypothesis Testing:

Hypothesis Testing to Check for a Relationship Between Time and Close Price for Netflix:

A hypothesis test is conducted to see whether there is a relationship between time and Close Price for Netflix at a 95% confidence level.

Hypothesis -

Null Hypothesis: There is no relationship between time and Close Price for Netflix

Alternative Hypothesis: There is a relationship between time and Close Price for Netflix

Decision Rule -

The alpha value here is 0.05.

  • If the p-value is greater than the alpha value, we fail to reject the null hypothesis.
  • If the p-value is smaller than the alpha value, we reject the null hypothesis and accept the alternative hypothesis

Test Statistic -

t = 64.949, p-value = 0.000 (as reported for x1 in the OLS summary above, i.e. below 0.0005)

Decision -

The p-value is smaller than the alpha value of 0.05, so we reject the null hypothesis in favor of the alternative hypothesis.

Conclusion -

As the null hypothesis is rejected, we can conclude that there is a relationship between time and Close Price of Netflix.

Fitted Regression Graph for Google:

This graph overlays three regression techniques (linear regression, k-NN, and decision tree) on the training and testing data for Google's closing price, so that the fits can be compared. The linear regression trace is the line of best fit, which makes the overall trend easier to read.
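
The drop-down menu in the cell below toggles traces through hand-written lists of True and False values. Below is a sketch of generating those button dictionaries instead, assuming the two data traces are added first and one trace per model follows. visibility_buttons is a hypothetical helper, not part of plotly.

```python
def visibility_buttons(model_names, n_base_traces=2):
    """Build updatemenus buttons: base traces stay on, one model trace per button."""
    n_models = len(model_names)
    buttons = [dict(label='All', method='update',
                    args=[{'visible': [True] * (n_base_traces + n_models)},
                          {'title': 'All', 'showlegend': True}])]
    for i, name in enumerate(model_names):
        # Keep the base traces visible and switch on only the i-th model trace.
        mask = [True] * n_base_traces + [j == i for j in range(n_models)]
        buttons.append(dict(label=name, method='update',
                            args=[{'visible': mask},
                                  {'title': name, 'showlegend': True}]))
    return buttons


buttons = visibility_buttons(
    ['Linear Regression', 'k-NN Regressor', 'Decision Tree Regressor'])
```

The resulting list can be passed as the buttons entry of the updatemenus dictionary. Because each title is taken from the same name as the label, the two cannot drift apart.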

In [43]:
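# Unix time in seconds; int64 is spelled out because plain int can map to 32 bits on some platforms.
google['timestamp'] = pd.to_datetime(google.Date).astype('int64') // (10**9)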
X = np.array(google['timestamp']).reshape(-1,1)
y = np.array(google['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig = go.Figure()

fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))

model = KNeighborsRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree', marker_color='gold'))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Linear Regression',
                  method = 'update',
                  args = [{'visible': [True, True, True, False, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, True, False]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, False, True]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Regression Line Fit for Google from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [44]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.910
Model:                            OLS   Adj. R-squared (uncentered):              0.910
Method:                 Least Squares   F-statistic:                          1.965e+04
Date:                Sun, 20 Dec 2020   Prob (F-statistic):                        0.00
Time:                        21:43:13   Log-Likelihood:                         -13599.
No. Observations:                1934   AIC:                                  2.720e+04
Df Residuals:                    1933   BIC:                                  2.721e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1          5.899e-07   4.21e-09    140.167      0.000    5.82e-07    5.98e-07
==============================================================================
Omnibus:                      284.325   Durbin-Watson:                   0.003
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              108.466
Skew:                           0.374   Prob(JB):                     2.80e-24
Kurtosis:                       2.114   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Hypothesis Testing:

Hypothesis Testing to Check for a Relationship Between Time and Close Price for Google:

A hypothesis test is conducted to see whether there is a relationship between time and Close Price for Google at a 95% confidence level.

Hypothesis -

Null Hypothesis: There is no relationship between time and Close Price for Google

Alternative Hypothesis: There is a relationship between time and Close Price for Google

Decision Rule -

The alpha value here is 0.05.

  • If the p-value is greater than the alpha value, we fail to reject the null hypothesis.
  • If the p-value is smaller than the alpha value, we reject the null hypothesis and accept the alternative hypothesis

Test Statistic -

t = 140.167, p-value = 0.000 (as reported for x1 in the OLS summary above, i.e. below 0.0005)

Decision -

The p-value is smaller than the alpha value of 0.05, so we reject the null hypothesis in favor of the alternative hypothesis.

Conclusion -

As the null hypothesis is rejected, we can conclude that there is a relationship between time and Close Price of Google.

Predicting Values for FAANG stocks:

In this section we use decision tree, linear regression, and k-NN regressors to predict closing prices. Each model takes a day's closing price as input and is trained to output the closing price 500 rows (trading days) later. The predictions for the last 500 days of data are then plotted against the actual values. Each graph is interactive and has several “modes”: the drop-down menu right above the graph selects which model's predictions are shown alongside the actual closing price.
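
The prediction cells build their target column with shift. Below is a minimal sketch of that step on a six-row toy frame, with future_days set to 2 instead of the notebook's 500. The names toy, features and target are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({'Close': [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]})
future_days = 2  # the notebook uses 500

# shift(-k) pairs each row with the Close value k rows later.
toy['Prediction'] = toy['Close'].shift(-future_days)

# The last k rows have no future value (NaN), so they are left out of training.
features = toy[['Close']].iloc[:-future_days]
target = toy['Prediction'].iloc[:-future_days]
print(toy)
```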

Predicting Closing Values for Facebook:

The graph below shows the actual and predicted closing prices for Facebook. Investors and other readers can use it to compare each model's predictions with how the Facebook stock actually moved.

In [45]:
df = facebook[['Close']].copy(deep=True)

future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)

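# drop() no longer accepts axis as a positional argument in pandas 2.0+, so name the columns explicitly.
X = np.array(df.drop(columns=['Prediction']))[:-future_days]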
y = np.array(df['Prediction'])[:-future_days]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)

x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days) 
x_future = np.array(x_future)

tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)

predictions = tree_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig = go.Figure()

fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
                         line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree', 
                         line=dict(width=1.5)))

predictions = lr_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg', 
                         line=dict(width=1.5)))

predictions = knn_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
                         marker_color='gold',
                         line=dict(width=1.5)))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Prediction',
                  method = 'update',
                  args = [{'visible': [True, True, False, False]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
             dict(label = 'Linear Regression Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, True, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, False, True]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Predicted Values for Facebook For the last 500 Days',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Accuracy Score for the Different Models - Facebook:

Here we calculate the scores for the three models: k-NN Regressor, Decision Tree Regressor, and Linear Regressor. For scikit-learn regressors, score returns the coefficient of determination (R²) on the test set rather than a classification accuracy. In the run printed below, the k-NN Regressor scores 0.9059, the Decision Tree Regressor 0.8576, and the Linear Regressor 0.8451. Because train_test_split is called without a fixed random_state, these values change from run to run. All three models score well, and the k-NN Regressor is the most accurate predicting model in this run.

In [46]:
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor Accuracy Score: " + str(confidenceknn))
print("Decision Tree Regressor Accuracy Score: " + str(confidencetree))
print("Linear Regressor Accuracy Score: " + str(confidencelr))
k-NN Regressor Accuracy Score: 0.9058651124346104
Decision Tree Regressor Accuracy Score: 0.8576399321043068
Linear Regressor Accuracy Score: 0.8451303497756099
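
The values printed above come from the score method of each regressor. A quick check on synthetic data shows that score gives the same number as r2_score applied to the model's predictions. The arrays X_demo and y_demo are made up for this sketch.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(100, 1))
y_demo = 3.0 * X_demo.ravel() + rng.normal(scale=1.0, size=100)

# Train on the first 75 rows, evaluate on the remaining 25.
model = LinearRegression().fit(X_demo[:75], y_demo[:75])
score_value = model.score(X_demo[75:], y_demo[75:])
r2_value = r2_score(y_demo[75:], model.predict(X_demo[75:]))
print(score_value, r2_value)
```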

Predicting Closing Values for Apple:

The graph below shows the actual and predicted closing prices for Apple. Investors and other readers can use it to compare each model's predictions with how the Apple stock actually moved.
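
The prediction cells repeat the same steps for each company. Below is a sketch of wrapping them in one function, assuming a DataFrame with a Close column and at least twice future_days rows. predict_last_days is a hypothetical helper, the toy series is made up, and random_state is added only so the sketch is repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor


def predict_last_days(stock, future_days=500, random_state=0):
    """Train on Close -> Close future_days rows later; predict the final window."""
    df = stock[['Close']].copy()
    df['Prediction'] = df['Close'].shift(-future_days)
    X = df[['Close']].to_numpy()[:-future_days]
    y = df['Prediction'].to_numpy()[:-future_days]
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state)
    models = {
        'D-Tree': DecisionTreeRegressor(random_state=random_state),
        'Lin Reg': LinearRegression(),
        'k-NN': KNeighborsRegressor(),
    }
    x_future = X[-future_days:]       # inputs for the last window
    valid = df.iloc[len(X):].copy()   # rows the predictions are plotted against
    scores = {}
    for name, model in models.items():
        model.fit(x_train, y_train)
        valid[name] = model.predict(x_future)
        scores[name] = model.score(x_test, y_test)
    return valid.drop(columns='Prediction'), scores


# Toy series standing in for one company's frame.
toy_stock = pd.DataFrame({'Close': np.linspace(100.0, 300.0, 400)})
valid, scores = predict_last_days(toy_stock, future_days=50)
```

The returned valid frame holds one prediction column per model next to the actual Close, which is the shape the plotting code needs.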

In [47]:
df = apple[['Close']].copy(deep=True)

future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)

X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)

x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days) 
x_future = np.array(x_future)

tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)

predictions = tree_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig = go.Figure()

fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
                         line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
                         line=dict(width=1.5)))

predictions = lr_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
                         line=dict(width=1.5)))

predictions = knn_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
                         marker_color='gold',
                         line=dict(width=1.5)))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Prediction',
                  method = 'update',
                  args = [{'visible': [True, True, False, False]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
             dict(label = 'Linear Regression Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, True, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, False, True]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Predicted Values for Apple For the last 500 Days',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Accuracy Score for the Different Models - Apple:

Here we calculate the scores for the three models: k-NN Regressor, Decision Tree Regressor, and Linear Regressor. For scikit-learn regressors, score returns the coefficient of determination (R²) on the test set rather than a classification accuracy. In the run printed below, the k-NN Regressor scores 0.8697, the Decision Tree Regressor 0.7929, and the Linear Regressor 0.7398. Because train_test_split is called without a fixed random_state, these values change from run to run. The k-NN Regressor is the most accurate predicting model in this run, and the Linear Regressor is the weakest.

In [48]:
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor Accuracy Score: " + str(confidenceknn))
print("Decision Tree Regressor Accuracy Score: " + str(confidencetree))
print("Linear Regressor Accuracy Score: " + str(confidencelr))
k-NN Regressor Accuracy Score: 0.8696615958078733
Decision Tree Regressor Accuracy Score: 0.792889770103356
Linear Regressor Accuracy Score: 0.7397621094500759
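
Because train_test_split shuffles the rows and is called without a random_state, the scores above change between runs. Below is a sketch of two alternatives on toy data: seeding the split so that results repeat, and passing shuffle=False so that the test set is the most recent quarter of the series. The second option is offered only as a suggestion for time-ordered data, not as what the notebook does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(20).reshape(-1, 1)
y_demo = np.arange(20) * 2

# Seeded shuffle: the same rows land in the test set on every run.
a_train, a_test, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0)
b_train, b_test, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0)

# Chronological split: the first 75% of rows train, the last 25% test.
c_train, c_test, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.25, shuffle=False)
print(c_test.ravel())
```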

Predicting Closing Values for Amazon:

The graph below shows the actual and predicted closing prices for Amazon. Investors and other readers can use it to compare each model's predictions with how the Amazon stock actually moved.

In [49]:
df = amazon[['Close']].copy(deep=True)

future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)

X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)

x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days) 
x_future = np.array(x_future)

tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)

predictions = tree_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig = go.Figure()

fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
                         line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
                         line=dict(width=1.5)))

predictions = lr_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
                         line=dict(width=1.5)))

predictions = knn_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
                         marker_color='gold',
                         line=dict(width=1.5)))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Prediction',
                  method = 'update',
                  args = [{'visible': [True, True, False, False]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
             dict(label = 'Linear Regression Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, True, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, False, True]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Predicted Values for Amazon For the last 500 Days',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Accuracy Score for the Different Models - Amazon:

Here we calculate the scores for the three models: k-NN Regressor, Decision Tree Regressor, and Linear Regressor. For scikit-learn regressors, score returns the coefficient of determination (R²) on the test set rather than a classification accuracy. In the run printed below, the k-NN Regressor scores 0.9556, the Decision Tree Regressor 0.9242, and the Linear Regressor 0.8459. Because train_test_split is called without a fixed random_state, these values change from run to run. All three models score well, and the k-NN Regressor is the most accurate predicting model in this run.

In [50]:
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor Accuracy Score: " + str(confidenceknn))
print("Decision Tree Regressor Accuracy Score: " + str(confidencetree))
print("Linear Regressor Accuracy Score: " + str(confidencelr))
k-NN Regressor Accuracy Score: 0.9555637970164639
Decision Tree Regressor Accuracy Score: 0.9241870430423782
Linear Regressor Accuracy Score: 0.8458883117616638
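
The scores above are unitless R² values, so it can help to report the error in price units as well. Below is a sketch using scikit-learn's mean_absolute_error and mean_squared_error on made-up values, not the notebook's predictions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up closing prices and predictions, for illustration only.
actual = np.array([100.0, 102.0, 105.0, 103.0])
predicted = np.array([98.0, 103.0, 105.0, 107.0])

mae = mean_absolute_error(actual, predicted)            # average absolute miss
rmse = np.sqrt(mean_squared_error(actual, predicted))   # penalizes large misses more
print(mae, rmse)
```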

Predicting Closing Values for Netflix:

The graph below shows the actual and predicted closing prices for Netflix. Investors and other readers can use it to compare each model's predictions with how the Netflix stock actually moved.

In [51]:
df = netflix[['Close']].copy(deep=True)

future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)

X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)

x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days) 
x_future = np.array(x_future)

tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)

predictions = tree_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig = go.Figure()

fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
                         line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
                         line=dict(width=1.5)))

predictions = lr_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
                         line=dict(width=1.5)))

predictions = knn_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
                         marker_color='gold',
                         line=dict(width=1.5)))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Prediction',
                  method = 'update',
                  args = [{'visible': [True, True, False, False]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
             dict(label = 'Linear Regression Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, True, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, False, True]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Predicted Values for Netflix For the last 500 Days',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
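
The cell above builds its training target by shifting the Close column upward by future_days rows. The following is a minimal sketch of that step on a toy table; the prices are made-up numbers and future_days is set to 2 instead of 500 purely so the result is easy to read.

```python
import pandas as pd

# Toy closing prices (made-up numbers, not taken from the dataset)
toy = pd.DataFrame({'Close': [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]})

future_days = 2  # the notebook uses 500; 2 keeps the toy example readable

# Each row's target is the close that occurs `future_days` rows later
toy['Prediction'] = toy['Close'].shift(-future_days)

# The last `future_days` rows have no future value (NaN), so they are cut off
X = toy[['Close']].to_numpy()[:-future_days]
y = toy['Prediction'].to_numpy()[:-future_days]

print(toy)
print(X.ravel(), y)
```

Each remaining row pairs a close with the close two rows later, which is the same feature and target layout the models in the notebook are fitted on.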

Accuracy Score for the Different Models - Netflix:

Here we calculate the accuracy scores for the three models: the k-NN Regressor, the Decision Tree Regressor, and Linear Regression. For scikit-learn regressors, the score method returns the coefficient of determination (R²) on the test set, so a value closer to 1 is better. In the run shown below, the k-NN Regressor scores about 0.879, the Decision Tree Regressor about 0.776, and Linear Regression about 0.681. Because train_test_split shuffles the data randomly, these values change slightly from run to run. The k-NN Regressor is the most accurate of the three models, while Linear Regression is noticeably weaker than the other two.

In [52]:
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor Accuracy Score: " + str(confidenceknn))
print("Decision Tree Regressor Accuracy Score: " + str(confidencetree))
print("Linear Regressor Accuracy Score: " + str(confidencelr))
k-NN Regressor Accuracy Score: 0.8789943462120081
Decision Tree Regressor Accuracy Score: 0.7764276099054495
Linear Regressor Accuracy Score: 0.6810040347105134
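
For a scikit-learn regressor, the value returned by score is the coefficient of determination (R²), not a classification accuracy. The sketch below checks this on synthetic data by comparing score against sklearn.metrics.r2_score; the data, the seed, and the variable names are assumptions made for illustration and are not part of the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the Close / Prediction columns (assumption)
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 1))
y = 3.0 * X.ravel() + rng.normal(0, 10, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

lr = LinearRegression().fit(x_train, y_train)

# Both lines compute R^2 on the held-out test set
score_from_method = lr.score(x_test, y_test)
score_from_metric = r2_score(y_test, lr.predict(x_test))

print(score_from_method, score_from_metric)
```

The two printed numbers should agree, which is why a score of 1.0 means a perfect fit and a score near 0 means the model does no better than predicting the mean.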

Predicting Closing Values for Google:

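The graph below plots the actual closing prices for Google alongside the closing prices predicted by three models: a Decision Tree Regressor, Linear Regression, and a k-NN Regressor. Each model is trained to predict the close 500 days ahead from the current close, and the predictions are drawn over the last 500 days of the data. Investors and other readers can use the plot to compare each model against what actually happened and to review the past and present behavior of Google stock.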

In [53]:
df = google[['Close']].copy(deep=True)

future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)

X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)

x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days) 
x_future = np.array(x_future)

tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)

predictions = tree_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig = go.Figure()

fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
                         line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
                         line=dict(width=1.5)))


predictions = lr_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
                         line=dict(width=1.5)))

predictions = knn_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
                         marker_color='gold',
                         line=dict(width=1.5)))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Prediction',
                  method = 'update',
                  args = [{'visible': [True, True, False, False]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
             dict(label = 'Linear Regression Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, True, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, False, True]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Predicted Values for Google For the last 500 Days',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Accuracy Score for the Different Models - Google:

Here we calculate the accuracy scores for the three models: the k-NN Regressor, the Decision Tree Regressor, and Linear Regression. As before, the score method returns the coefficient of determination (R²) on the test set. In the run shown below, the k-NN Regressor scores about 0.931, the Decision Tree Regressor about 0.886, and Linear Regression about 0.880. Because train_test_split shuffles the data randomly, these values change slightly from run to run. All three models score above 0.87 for Google, and the k-NN Regressor is again the most accurate.

In [54]:
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor Accuracy Score: " + str(confidenceknn))
print("Decision Tree Regressor Accuracy Score: " + str(confidencetree))
print("Linear Regressor Accuracy Score: " + str(confidencelr))
k-NN Regressor Accuracy Score: 0.9308921932729871
Decision Tree Regressor Accuracy Score: 0.8859263989100259
Linear Regressor Accuracy Score: 0.8797519314737576
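
Since train_test_split shuffles rows randomly unless random_state is set, the scores printed above will differ from run to run. The sketch below, which uses synthetic data and a hypothetical helper named knn_score, shows one way to make a score repeatable by fixing the seed. It is an illustration under those assumptions, not the configuration used in the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data standing in for the Close / Prediction columns (assumption)
rng = np.random.default_rng(1)
X = rng.uniform(0, 100, size=(300, 1))
y = np.sqrt(X.ravel()) + rng.normal(0, 0.5, size=300)

def knn_score(seed):
    """Hypothetical helper: split with a fixed seed, fit k-NN, return test R^2."""
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed)
    return KNeighborsRegressor().fit(x_train, y_train).score(x_test, y_test)

same_a = knn_score(42)   # fixed seed
same_b = knn_score(42)   # same seed again, so the same split and the same score
other = knn_score(7)     # a different seed gives a different split

print(same_a, same_b, other)
```

Passing the same random_state in the notebook's train_test_split calls would make the reported scores reproducible, at the cost of tying them to one particular split.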

Inference from Predictive Graphs:

The inference we can draw from these graphs and accuracy scores is that the k-NN Regressor gives the most accurate predictions of the three models for both Netflix and Google. There are a couple of outliers in the graphs that could affect the predictions. However, these outliers appear to be minor, and they may change depending on the trading model that companies decide to adopt.

Insights and Policy Decision